Direct Orthographical Mapping for Machine Transliteration
نویسندگان
چکیده
Machine transliteration/back-transliteration plays an important role in many multilingual speech and language applications. In this paper, a novel framework for machine transliteration/backtransliteration that allows us to carry out direct orthographical mapping (DOM) between two different languages is presented. Under this framework, a joint source-channel transliteration model, also called n-gram transliteration model (ngram TM), is further proposed to model the transliteration process. We evaluate the proposed methods through several transliteration/backtransliteration experiments for English/Chinese and English/Japanese language pairs. Our study reveals that the proposed method not only reduces an extensive system development effort but also improves the transliteration accuracy significantly.
منابع مشابه
A Modified Joint Source-Channel Model for Transliteration
Most machine transliteration systems transliterate out of vocabulary (OOV) words through intermediate phonemic mapping. A framework has been presented that allows direct orthographical mapping between two languages that are of different origins employing different alphabet sets. A modified joint source–channel model along with a number of alternatives have been proposed. Aligned transliteration...
متن کاملA Joint Source-Channel Model for Machine Transliteration
Most foreign names are transliterated into Chinese, Japanese or Korean with approximate phonetic equivalents. The transliteration is usually achieved through intermediate phonemic mapping. This paper presents a new framework that allows direct orthographical mapping (DOM) between two different languages, through a joint source-channel model, also called n-gram transliteration model (TM). With t...
متن کاملCombining a Two-step Conditional Random Field Model and a Joint Source Channel Model for Machine Transliteration
This paper describes our system for “NEWS 2009 Machine Transliteration Shared Task” (NEWS 2009). We only participated in the standard run, which is a direct orthographical mapping (DOP) between two languages without using any intermediate phonemic mapping. We propose a new two-step conditional random field (CRF) model for DOP machine transliteration, in which the first CRF segments a source wor...
متن کاملJointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm
This paper presents a joint optimization method of a two-step conditional random field (CRF) model for machine transliteration and a fast decoding algorithm for the proposed method. Our method lies in the category of direct orthographical mapping (DOM) between two languages without using any intermediate phonemic mapping. In the two-step CRF model, the first CRF segments an input word into chun...
متن کاملMachine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information
Machine transliteration is an automatic method to generate characters or words in one alphabetical system for the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language application such as information retrieval and machine translation, especially for handling proper nouns and technical terms. The previous works focus on ei...
متن کامل